Non-native text analysis: A survey

Authors

  • Sean Massung
  • ChengXiang Zhai
Abstract

Non-native speakers of English far outnumber native speakers; English is the main language of books, newspapers, airports, air-traffic control, international business, academic conferences, science, technology, diplomacy, sports, international competitions, pop music, and advertising (British Council 2014). Online education in the form of massive open online courses is also primarily in English, and some of these courses even teach English. This creates enormous amounts of text written by non-native speakers, which in turn generates a need for grammar correction and analysis. Even aside from massive open online courses, the number of English learners in Asia alone is in the tens of millions. In this paper, we provide a survey of the two main areas of existing work on non-native text analysis, prefaced by an overview of common datasets used by researchers, comparing their attributes and potential uses. Then, an introduction to native language identification follows: determining the native language of an author based on text in the second language. This section is subdivided into various techniques and a shared task on this classification problem. Next, we discuss non-native grammatical error correction: finding and modifying text to fix errors or to make it sound more fluent. Again, we discuss different methods before investigating a relevant shared task. Lastly, we end with conclusions and potential future directions. While this survey primarily focuses on detecting and correcting non-native English text, many approaches are general and can be used across any language pairing.

Similar resources

Contrastive Analysis of Metadiscourse Markers Used by Non-native (Iranians) vs. Native (Americans) Speakers in Developing ELT Materials

Metadiscourse is a widely used term in current discourse analysis and language education, referring to an interesting, and relatively new approach to conceptualizing interaction between text producers and their texts and between text producers and users. Despite the growing importance of the term, however, it is often understood in different ways and used to refer to different aspects of langua...

Clause Complexity in Applied Linguistics Research Article Abstracts by Native and Non-Native English Writers: Taxis, Expansion and Projection

Halliday’s Systemic Functional Linguistics (SFL) has stood the test of time as a model of text analysis. The present literature contains a plethora of studies that, while taking the ‘clause’ as a unit of analysis, have investigated the metafunctions in research articles of a single field of study or compared those of various fields. Although ‘clause complex’ is another unit of SF a...

Native and Non-native Use of Lexical Bundles in Discussion Section of Political Science Articles

The study of lexical bundles, among types of text analysis, has been gaining importance over the others in the last century. The present study employed a frequency-based approach to analyzing the use of lexical bundles. The discussion sections of 60 political science articles, a corpus of around 253,063 words, were investigated with respect to three aspects of lexical bundles: structure, form, and function. The p...

Metadiscourse Elements in English Research Articles Written by Native English and Non-native Iranian Writers in Applied Linguistics and Civil Engineering

This study investigated metadiscourse and its subcategories in English research articles (RAs) written by nonnative (Iranian) and native English writers from the two disciplines of applied linguistics and civil engineering. The study aimed at seeing whether language and discipline influenced the frequency of occurrence of metadiscourse elements in research articles. To this end, a sample of 120...

A new approach to the analysis and annotation of speech and prosody based on computerized cross-linguistic corpora

In the present paper, corpus linguistics becomes a valuable methodological tool for cross-linguistic research on speech and prosody. The inherent complexity of speech analysis and prosodic annotation increases when the object of study is a longitudinal computerized corpus of native and nonnative varieties of English. The lack of generally accepted prosodic transcription systems adds further dif...

Journal:
  • Natural Language Engineering

Volume: 22

Publication date: 2016